Introduction
The representation of women in media has long been a topic of interest, as it reflects societal norms and attitudes towards gender equality. Despite the progress made in recent decades towards gender equality, it is important to examine whether these changes are reflected in the films we watch. Movies provide a unique insight into the subconscious ways in which society is conditioned to view women, and can capture the ideals and norms of the time in which they were produced.
In this data analysis project, we use the CMU Movie Summary Corpus, together with its Stanford CoreNLP-processed summaries and additional data from IMDb, Wikidata, and Box Office Mojo, to explore the portrayal of women in film. We examine actresses, characters, writers, and directors. By analyzing these factors, we aim to gain a deeper understanding of how women have been depicted in media over time and how this representation may have evolved. Our analysis also lets us consider how society views and treats women and the progress made towards gender equality.
The Data
Our analysis is based on merging the CMU Dataset, the Stanford CoreNLP-processed summaries, IMDb, Wikidata, and Box Office Mojo. We have separated the data into three tables: the movies table, the characters table, and the directors and writers table.
- The movies table contains titles, release year, runtime, box office revenue, average rating and number of votes on IMDb, genre, as well as the list of directors and writers. There are a total of 81,741 different movies.
- The characters table contains the ID of the movie, the character name, the actor's name, height, ethnicity, and birth and death years, as well as the movie metric and the actor metric. There are 450,669 characters played by 135,761 different actors.
- The directors and writers table contains titles, role (either director or writer), name, gender, birth year, and height. There are a total of 86,474 directors and 164,271 writers.
The Impact Score metric
Movies
We have created a metric that measures the impact of a movie based on its average rating and its number of votes on IMDb. Our assumption is that an impactful movie has many votes and an average rating that is either extremely good or extremely bad.
We apply a logarithmic transformation to the number of votes in order to normalize the data and accurately compare the impact of different movies. We then take the absolute value of the normalized average rating for each movie. This accounts for both very good and very bad movies, as both have a significant impact on audience reception. By combining these two factors, we are able to calculate the overall impact a movie has on its audience and compare this across different films.
\[\textrm{Impact Score}_\textrm{Movies} = \textrm{normalized} (\log(\textrm{number of votes})) \cdot \textrm{abs}(\textrm{normalized}(\textrm{IMDb rating}))\]
According to this metric, these are the top 10 most impactful movies in our dataset:
| title | average rating | number of votes | impact score |
|---|---|---|---|
| The Shawshank Redemption | 9.3 | 2648879 | 9.90 |
| The Dark Knight | 9.0 | 2620838 | 8.91 |
| Inception | 8.8 | 2322848 | 8.15 |
| Fight Club | 8.8 | 2093849 | 8.05 |
| Forrest Gump | 8.8 | 2051278 | 8.03 |
| Pulp Fiction | 8.9 | 2027513 | 8.33 |
| The Matrix | 8.7 | 1894094 | 7.64 |
| The Lord of the Rings: The Fellowship of the Ring | 8.8 | 1851387 | 7.93 |
| The Godfather | 9.2 | 1836155 | 9.16 |
| The Lord of the Rings: The Return of the King | 9.0 | 1824685 | 8.53 |
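The movie impact score defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which normalization is used, so the sketch assumes z-score standardization (zero mean, unit variance), and the function names, dictionary keys, and toy movies are all hypothetical.

```python
import math
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to zero mean and unit variance.

    Assumption: 'normalized' in the formula is read as a z-score;
    the original analysis may use a different normalization.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def movie_impact_scores(movies):
    """Return {title: impact score} for a list of movie dicts.

    Each dict needs the hypothetical keys 'title', 'rating' and 'votes'
    (votes must be at least 1 so that the logarithm is defined).
    """
    log_votes = zscores([math.log(m["votes"]) for m in movies])
    ratings = zscores([m["rating"] for m in movies])
    return {
        m["title"]: lv * abs(r)
        for m, lv, r in zip(movies, log_votes, ratings)
    }


# Toy data, not taken from the real dataset.
toy_movies = [
    {"title": "Loved by many", "rating": 9.0, "votes": 1_000_000},
    {"title": "Hated by many", "rating": 2.0, "votes": 1_000_000},
    {"title": "Average, popular", "rating": 5.5, "votes": 1_000_000},
    {"title": "Average, obscure", "rating": 5.5, "votes": 100},
]
for title, score in movie_impact_scores(toy_movies).items():
    print(f"{title}: {score:.2f}")
```

Under this reading, a very good and a very bad movie with the same number of votes receive the same score, while a movie with an exactly average rating scores zero. Movies with fewer votes than average would get a negative score with a z-score; a min-max scaling of the log votes would avoid that, if that is what the original normalization does.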
Actors, writers and directors
For actors, writers, and directors, we rank the movies each person is linked to by their impact score and aggregate them with the Discounted Cumulative Gain (DCG), so that a person's most impactful movies contribute the most to their overall impact.
\[\textrm{Impact Score}_\textrm{Actors, Directors, Writers} = \sum_{i=1}^{\textrm{number of movies}}\frac{\textrm{movie metric}_i}{\log_2(i + 1)}\]
Here are the top 10 actors, directors, and writers with the highest impact scores:
| actor | impact score | director | impact score | writer | impact score |
|---|---|---|---|---|---|
| Samuel L. Jackson | 47.28 | Steven Spielberg | 35.52 | Stephen King | 35.70 |
| Robert De Niro | 45.92 | Martin Scorsese | 34.01 | George Lucas | 29.18 |
| Michael Caine | 42.68 | Alfred Hitchcock | 30.92 | Christopher Nolan | 29.14 |
| Morgan Freeman | 42.38 | Christopher Nolan | 29.12 | Bob Kane | 28.51 |
| Al Pacino | 39.39 | Francis Ford Coppola | 27.79 | Quentin Tarantino | 27.30 |
| Bruce Willis | 38.88 | Quentin Tarantino | 26.34 | Francis Ford Coppola | 26.90 |
| Gary Oldman | 37.17 | Akira Kurosawa | 24.82 | Akira Kurosawa | 26.66 |
| Robert Duvall | 36.77 | Stanley Kubrick | 24.71 | David S. Goyer | 25.18 |
| Tom Hanks | 36.71 | Clint Eastwood | 23.37 | Billy Wilder | 24.22 |
| Brad Pitt | 36.55 | Uwe Boll | 22.24 | Hayao Miyazaki | 24.02 |
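The Discounted Cumulative Gain formula above can be illustrated with a short Python sketch. It assumes that a person's movies are ranked from highest to lowest movie impact score before discounting, which is the usual DCG convention; the function name and the example values are hypothetical.

```python
import math


def person_impact_score(movie_scores):
    """Discounted Cumulative Gain over one person's movie impact scores.

    Assumption: movies are ranked in descending order of impact score,
    so the movie at rank i is divided by log2(i + 1).
    """
    ranked = sorted(movie_scores, reverse=True)
    return sum(
        score / math.log2(rank + 1)
        for rank, score in enumerate(ranked, start=1)
    )


# Hypothetical movie impact scores for one person.
print(round(person_impact_score([8.0, 3.0, 1.5]), 2))
```

Because the first-ranked movie is divided by log2(2) = 1, it counts in full, and every additional movie contributes a progressively smaller share. This rewards both a few very impactful movies and a long filmography.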
Where are the Women?
As our project focuses on the representation of women in movies, we start by looking at the percentage of actresses per genre and per decade.
When it comes to genre, women are most often represented in dramas, comedies, and romances, while they are underrepresented in action, adventure, and sci-fi films.
When considering the representation of women among directors and writers, we found that the pattern is similar, although the overall percentage of women in these roles is significantly lower than for actresses.
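The percentage of actresses per genre and per decade mentioned above could be computed roughly as follows. This is a minimal sketch, assuming one row per character with hypothetical keys `genre`, `year`, and `gender` (with `"F"` marking actresses); the real tables may be laid out differently.

```python
from collections import defaultdict


def female_share_by_genre_decade(rows):
    """Return {(genre, decade): percentage of rows whose gender is 'F'}.

    Assumption: each row is a dict with the hypothetical keys
    'genre', 'year' (release year as int) and 'gender' ('F' or 'M').
    """
    counts = defaultdict(lambda: [0, 0])  # (genre, decade) -> [female, total]
    for row in rows:
        decade = (row["year"] // 10) * 10
        key = (row["genre"], decade)
        counts[key][1] += 1
        if row["gender"] == "F":
            counts[key][0] += 1
    return {
        key: 100.0 * female / total
        for key, (female, total) in counts.items()
    }


# Toy rows, not taken from the real dataset.
toy_rows = [
    {"genre": "Drama", "year": 1994, "gender": "F"},
    {"genre": "Drama", "year": 1999, "gender": "M"},
    {"genre": "Action", "year": 2008, "gender": "M"},
]
for key, share in sorted(female_share_by_genre_decade(toy_rows).items()):
    print(key, f"{share:.1f}%")
```

The same grouping could be applied to the directors and writers table by filtering on the role first.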
Behind the Camera
Put the director analysis here. Who are the most impactful directors for the most common genres? What are their best movies? What are they about? How does the impact of female directors compare with that of male directors?
In Front of the Camera
Put the Actor analysis here
On the Screen
Put the character analysis here
Women of Impact
Despite the challenges facing women in the film industry, there are many women who have made a significant impact and achieved great success in their roles.
These women have not only excelled in their careers, but have also challenged stereotypes and paved the way for future generations of women in media.
That’s a Wrap!
In conclusion, the representation of women in media is limited and often stereotypical. However, there are many talented and successful women in the industry who are making a significant impact. It is important for the industry to continue to strive for greater diversity and representation, in order to create a more accurate and fair portrayal of women in media.